Friends of the Earth (FoE) have recently released a report focused on “England’s Green Space Gap.” The headline finding of the report is that one in five people in England live in areas where it is difficult to access green space. The report also provides a holistic overview of why green space is so important, by highlighting how individuals and communities benefit from having access to both public and private green space. These benefits stretch far beyond the natural environment itself, and encompass a myriad of social, health and economic benefits.
As part of the research underpinning the Green Space Gap report, Friends of the Earth have developed a new approach for classifying the extent to which neighborhoods (or Middle Super Output Areas in the terminology of the administrative geography) across England experience green space deprivation. Neighborhoods are classified into five groups, with group A including the least green space deprived neighborhoods, and E including the most green space deprived.
Friends of the Earth have released the dataset that they developed and used to classify green space deprivation within the Green Space Gap report. In this notebook I plan to conduct an exploratory data analysis using this Friends of the Earth dataset. Before doing so, I think it might be helpful to outline the way in which Friends of the Earth processed the dataset. This is outlined in the figure below and incorporated the following steps:
Producing the Friends of the Earth Green Space Deprivation ratings.
Accessing two Public Datasets from the ONS detailing: (1) the amount of garden space and the accessibility of public green spaces (parks etc.); and (2) the extent of various forms of deprivation.
Identifying Underlying Variables of interest within these two datasets including for example the amounts of different forms of public green space.
Creating three Summary Variables by processing the Underlying Variables. For example, calculating the percentage of population within each neighbourhood within 5 minutes walk of public green space.
Calculating green space Scores from each of the three Summary Variables.
Classifying each neighbourhood based on its green space Scores into one of five green space deprivation Overall Ratings.
Full details of the methodology used by Friends of the Earth can be found on page 36 of the Green Space Gap report.
n.b. In the report, Friends of the Earth draw on the Index of Multiple Deprivation (IMD) dataset to explore the relationship between the green space deprivation ratings and demographic factors including ethnicity and income.
Reading the Green Space Gap report and exploring the associated dataset, I was struck by a number of questions about the nature and scope of green space deprivation in England. I thought that these questions might be a good basis for my exploratory data analysis.
What is the scale of the green space deprivation problem in England?
Is green space deprivation an urban problem?
How is green space deprivation distributed across regions in England?
What can the dataset tell us about what green space deprivation looks like in England?
Below I address each of these questions in turn with the aim of extending upon the analysis of the data presented in the report. By doing so, I hope to contribute to the wider debate on maintaining and extending access to green space during the post-covid recovery.
Before moving on to the exploratory data analysis itself, I thought it would be helpful to briefly document the datasets I used. This includes the Friends of the Earth dataset, and additional datasets from the ONS which proved interesting or helpful in the context of my exploratory data analysis. In particular, I thought it was worth recording the versions of the datasets used where multiple versions are available from the ONS.
| variable_name | file_name | notes | url |
|---|---|---|---|
| green_space | (FOE) Green Space Consolidated Data - England - Version 2.1.xlsx | … | https://friendsoftheearth.uk/nature/green-space-consolidated-data-england |
| LAD_to_region | Local_Authority_District_to_Region__December_2019__Lookup_in_England.csv | used the December 2019 version | https://geoportal.statistics.gov.uk/datasets/3ba3daf9278f47daba0f561889c3521a_0 |
| urban_rural_classification | RUC_MSOA_2001_EW_LU.csv | 2001 was the latest version available | https://geoportal.statistics.gov.uk/datasets/rural-urban-classification-2001-of-msoas-in-england-and-wales |
Ahead of conducting the exploratory data analysis I imported the three datasets and then merged them into the single dataframe shown below. I have retained all the variables from the Friends of the Earth dataset in this dataframe, including non-green space variables from the Indices of Multiple Deprivation, in case they prove useful later in the analysis.
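The merge described above can be sketched as a pair of left joins onto the Friends of the Earth data. This is only a minimal illustration in plain Python, not the code used in the notebook: the column names (`msoa_code`, `lad_code`, `region`, `urban_rural`) and the three tiny in-memory tables are hypothetical stand-ins for the real files.

```python
import csv
import io

# Hypothetical miniature versions of the three datasets (not real rows).
GREEN_SPACE = """msoa_code,lad_code,green_space_deprivation_rating
M1,LA1,E
M2,LA2,A
M3,LA1,C
"""
LAD_TO_REGION = """lad_code,region
LA1,London
LA2,South West
"""
URBAN_RURAL = """msoa_code,urban_rural
M1,Urban > 10K
M2,Village Hamlet & Isolated Dwellings
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left, right, key):
    """Keep every row of `left`; add the columns of `right` where `key` matches."""
    lookup = {row[key]: row for row in right}
    extra_cols = [col for col in right[0] if col != key]
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        # Unmatched rows get None, which plays the role of a missing value.
        joined.append({**row, **{col: match.get(col) for col in extra_cols}})
    return joined


merged = left_join(read_rows(GREEN_SPACE), read_rows(LAD_TO_REGION), "lad_code")
merged = left_join(merged, read_rows(URBAN_RURAL), "msoa_code")

for row in merged:
    print(row)
```

A left join keeps every neighbourhood from the Friends of the Earth data even when a lookup table has no matching row, so gaps in the lookups show up as missing values rather than as silently dropped neighbourhoods.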
This exploratory data analysis begins by focusing on the green space deprivation ratings of each English neighborhood. In this section of the analysis, I do not drill down into the underlying data that informs the ratings. More detailed exploration of the underlying data is picked up in the later sections of this notebook. But initially I wanted to get a better understanding of the green space deprivation ratings themselves, and their potential implications.
The first question I turn to is how many neighborhoods are considered green space deprived in the Friends of the Earth Analysis? The plot below shows the numbers of neighborhoods classified in each category. Reviewing the plot I noted that:
There are a small number of categories that a rating can fall into (A-E), and the ratings are based on simplified and abstracted representations of the ONS green space data (i.e. the green space scores described in the introduction section above). So, I do not think there is much value at this stage in considering descriptive statistics which summarise the distribution of green space deprivation ratings. At some point it might be worth considering how the simplifications/abstractions used have affected the distributions of the ONS green space data, but I leave this to one side for now.
I am interested in understanding in more detail the numbers and proportions of both neighborhoods and the population which are impacted by green space deprivation. To this end I produced the table below. The table separates the ratings into two groups:
D and E: The neighborhoods rated as suffering the most extensive green space deprivation, and which FoE identify as in need of urgent action to improve access to green space.
A, B and C: Neighborhoods which FoE identify as in need of action to protect green space and access to it, in the context of the long term trends which are reducing access to green space.
Reviewing the table I noted:
That around 30% of neighborhoods are rated either D or E, with around 17.84 million people (or 1 in 3 people) living in these neighborhoods and experiencing considerable green space deprivation.
Around 70% of neighborhoods are rated either A, B or C, with 25% of neighborhoods rated C and at risk of falling into considerable green space deprivation (i.e. ratings D and E) if green space and access to it are not protected over the coming decades.
The percentages of neighborhoods and percentages of population are very similar in each rating group. So, considering the proportions of neighborhoods gives a fair indication of the proportions of the population affected.
Green Space Deprivation in England: understanding the scale of the problem

| Green Space Deprivation Rating | Neighbourhoods (number) | Neighbourhoods (%) | Population (millions) | Population (%) |
|---|---|---|---|---|
| Urgent action needed to improve access to green space | | | | |
| E | 1,108 | 16 | 9.62 | 18 |
| D | 955 | 14 | 8.21 | 15 |
| Total | 2,063 | 30 | 17.84 | 33 |
| Action needed to protect access to green space | | | | |
| C | 1,727 | 25 | 13.54 | 25 |
| B | 1,360 | 20 | 10.77 | 20 |
| A | 1,641 | 24 | 12.58 | 23 |
| Total | 4,728 | 70 | 36.89 | 67 |

Source: Friends of the Earth
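As a rough check on the table above, the group totals and percentages can be recomputed from the per-rating figures. The sketch below is not the notebook's own code. It simply copies the published counts and populations into a dictionary (the variable names are mine) and aggregates them into the two groups. Because the inputs are already rounded to two decimal places, the population totals can differ from the table in the last digit.

```python
# Neighbourhood counts and populations (millions), copied from the table above.
ratings = {
    "E": (1108, 9.62),
    "D": (955, 8.21),
    "C": (1727, 13.54),
    "B": (1360, 10.77),
    "A": (1641, 12.58),
}

# The two groups used in the table.
groups = {
    "D and E": ("D", "E"),
    "A, B and C": ("A", "B", "C"),
}

total_neighbourhoods = sum(n for n, _ in ratings.values())
total_population = sum(pop for _, pop in ratings.values())

summary = {}
for label, members in groups.items():
    n = sum(ratings[r][0] for r in members)
    pop = sum(ratings[r][1] for r in members)
    summary[label] = {
        "neighbourhoods": n,
        "pct_neighbourhoods": round(100 * n / total_neighbourhoods),
        "population_millions": round(pop, 2),
        "pct_population": round(100 * pop / total_population),
    }

for label, row in summary.items():
    print(label, row)
```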
Having explored how many neighborhoods and people are affected by green space deprivation across England as a whole, I was interested to understand if communities and people in some regions are more affected than others. In turn this could provide an indication of where action to alleviate green space deprivation is most needed. The plot below shows, for each region, the numbers of neighborhoods receiving each green space deprivation rating. Reviewing the plot I noted:
I was interested to understand in a little more detail where the neighborhoods receiving the highest green space deprivation ratings (D and E) are located. Below I plotted how the proportions of neighborhoods receiving a given rating (in this case D or E) are distributed across the English regions. Doing this involved addressing the challenge of how to ensure the colour associated with a given region was applied consistently across the two plots. This blog on How to map a colour to a value of a categorical variable … was very helpful in addressing this challenge.
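The general idea behind a consistent colour mapping is to fix the region-to-colour lookup once and reuse it in every plot, rather than letting each plot assign colours by the order in which regions happen to appear. The sketch below shows that idea in plain Python. The hex colours are arbitrary placeholders, the helper name `colours_for` is hypothetical, and the notebook itself may well have used a different tool and approach.

```python
# The nine English regions, sorted so the mapping does not depend on data order.
REGIONS = sorted([
    "North East", "North West", "Yorkshire and The Humber",
    "East Midlands", "West Midlands", "East of England",
    "London", "South East", "South West",
])

# Arbitrary placeholder palette with one colour per region.
PALETTE = [
    "#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e",
    "#e6ab02", "#a6761d", "#666666", "#1f78b4",
]

# Fixed lookup, built once and shared by every plot.
REGION_COLOURS = dict(zip(REGIONS, PALETTE))


def colours_for(regions_in_plot):
    """Return the colour for each region, in the order the plot draws them."""
    return [REGION_COLOURS[region] for region in regions_in_plot]


# Two plots that list regions in different orders still agree on colours.
plot_one = colours_for(["London", "North East", "South West"])
plot_two = colours_for(["South West", "London"])
print(plot_one, plot_two)
```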
Reviewing the plot below I noted that:
Having explored how green space deprivation is distributed across the English regions, I was interested to dig a little deeper into the question of where (in a general rather than geographic sense) green space deprivation is a problem. In particular, it seems to intuitively make sense that green space deprivation is primarily an urban problem. I wanted to see if this intuition is borne out by the data.
This involved finding a dataset which classified MSOAs (i.e. neighborhoods) by whether or not they can be considered urban. Finding the appropriate ONS dataset took a little time and effort, but in the end I found an urban-rural classification conducted in 2001. Obviously it is not ideal to use twenty-year-old data, when over that period it is likely that some rural MSOAs on the edges of urban areas will have become more developed. A more recent urban-rural classification has been conducted, but the results do not appear to have been released as open data (the results are displayed on a web-GIS).
The table below shows the breakdown of neighborhoods by both green space deprivation rating and the type of neighborhood as defined in the ONS dataset. Neighborhoods are classified into one of three categories: (1) Urban > 10K; (2) Town and Fringe; and (3) Village Hamlet & Isolated Dwellings. I also included an additional category for neighborhoods where it was not possible to identify an urban-rural classification (see column NA_). The percentages in the table sum to 100% column-wise. That is to say, the percentages show how the neighborhoods with each urban-rural classification break down across the five green space deprivation ratings.
Reviewing the table I noted that:
Some MSOAs (151 in total) appear in the NA_ column. In other words, for these MSOAs it wasn’t possible to identify whether they were urban or rural. This isn’t ideal, but I decided to omit MSOAs where no urban-rural classification is available from the analysis conducted in this section of the notebook. Having looked at the names of the MSOAs appearing in the NA_ column, I think it would be possible to conduct some name matching between the datasets to impute at least some of the missing urban-rural classification values. However, at this stage I am not sure if this is worth the effort.

| green_space_deprivation_rating | Town and Fringe | Urban > 10K | Village Hamlet & Isolated Dwellings | NA_ |
|---|---|---|---|---|
| A | 51% (309) | 14% (747) | 84% (566) | 13% (19) |
| B | 7% (41) | 24% (1287) | 0% (0) | 21% (32) |
| C | 41% (248) | 25% (1345) | 16% (110) | 16% (24) |
| D | 1% (6) | 17% (923) | 0% (0) | 17% (26) |
| E | 0% (2) | 20% (1056) | 0% (0) | 33% (50) |
| Total | 100% (606) | 100% (5358) | 100% (676) | 100% (151) |
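The column-wise percentages in the table above can be reproduced from the counts alone. The sketch below is not the notebook's code. It copies the counts into a nested dictionary (the variable names are mine) and divides each count by its column total. As a follow-on, it also computes what share of the D and E neighbourhoods are urban once the NA_ column is excluded, which is one way to put a number on the "urban problem" intuition.

```python
# Counts per urban-rural class and rating, copied from the table above.
counts = {
    "Town and Fringe": {"A": 309, "B": 41, "C": 248, "D": 6, "E": 2},
    "Urban > 10K": {"A": 747, "B": 1287, "C": 1345, "D": 923, "E": 1056},
    "Village Hamlet & Isolated Dwellings": {"A": 566, "B": 0, "C": 110, "D": 0, "E": 0},
    "NA_": {"A": 19, "B": 32, "C": 24, "D": 26, "E": 50},
}

# Column-wise percentages: each count divided by its own column total.
column_totals = {col: sum(by_rating.values()) for col, by_rating in counts.items()}
column_pct = {
    col: {rating: round(100 * n / column_totals[col]) for rating, n in by_rating.items()}
    for col, by_rating in counts.items()
}

# Share of D and E neighbourhoods that are urban, leaving out unclassified MSOAs.
classified = {col: by_rating for col, by_rating in counts.items() if col != "NA_"}
d_or_e_classified = sum(v["D"] + v["E"] for v in classified.values())
d_or_e_urban = counts["Urban > 10K"]["D"] + counts["Urban > 10K"]["E"]
urban_share_of_d_or_e = round(100 * d_or_e_urban / d_or_e_classified, 1)

print(column_totals)
print(column_pct["Urban > 10K"])
print(urban_share_of_d_or_e)
```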
Reviewing the visual representation of the relationship - between green space deprivation rating and urban-rural classification of each neighborhood - below highlighted the following points.
Having focused on the green space deprivation ratings themselves so far, I was interested to understand more about the data that informed these ratings. The ratings are calculated using three green space scores (each ranging from 1 to 4); see page 36 of the report for more details. In turn, each of these scores was calculated based on (what I have called) summary variables:
garden_area_per_capita in the dataset;
green_space_area_per_capita in the dataset;
pcnt_pop_with_go_space_access in the dataset. For this variable FoE considered only public green spaces of two or more hectares in size. I am unsure how FoE calculated values for this variable; I assume some form of GIS analysis was involved. The definition of this variable is based on a Natural England standard. In turn this standard is based on research indicating that people access green space within five minutes' walk considerably more frequently than green space beyond five minutes' walk.
In this section of the notebook I explore the distributions of, and correlations between, these three summary variables. Throughout this exploration I am seeking to better understand what green space deprivation looks like in England, and how this helps me to better understand the FoE green space ratings. This, in turn, I hope will inform my thinking about how to use statistical methods (e.g. k-means or k-medians clustering) to identify clusters of neighborhoods with similar green space characteristics. In this section I do, however, set aside the green space score variables, which FoE calculated from the summary variables, as each of the scores is essentially a simplified version of one of the three summary variables.
Before looking at each of the summary variables in more detail I take a very quick look at some descriptive statistics. Reviewing the table below, I note that:
garden_area_per_capita and green_space_area_per_capita are likely to be extremely right skewed. For example, the median value for green_space_area_per_capita is approximately 19 m² per person, the 75th percentile is approximately 48 and the maximum is over 100,000. So, when plotting these variables below it will probably be necessary to transform axis scales and/or omit very high values for green space areas from plots.
This is not a concern for pcnt_pop_with_go_space_access, as it is a percentage and so all values necessarily lie in the range of 0 to 100.
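To see why a transformed axis helps with this kind of skew, the sketch below uses a handful of synthetic values. They loosely echo the quoted median, 75th percentile and maximum but are not taken from the dataset. A single extreme value drags the mean far above the median, while a log10 transform squeezes the whole range into a few units that can be plotted on one axis.

```python
import math
import statistics

# Synthetic, illustrative values only (m² per person); not real data.
values = [4.0, 9.0, 15.0, 19.0, 30.0, 48.0, 100_000.0]

mean_value = statistics.mean(values)
median_value = statistics.median(values)

# On the raw scale the largest value dwarfs everything else.
raw_spread = max(values) - min(values)

# On a log10 scale the same values span only a few units.
log_values = [math.log10(v) for v in values]
log_spread = max(log_values) - min(log_values)

print(f"mean={mean_value:.1f}, median={median_value:.1f}")
print(f"raw spread={raw_spread:.1f}, log10 spread={log_spread:.2f}")
```

A log transform only works for strictly positive values, so any neighbourhoods with zero green space area would need separate handling (for example, plotting them apart or adding a small offset).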